Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checker(dm): check the number of connections before starting #5185

Merged
merged 30 commits into from
Jul 29, 2022

Conversation

buchuitoudegou
Copy link
Contributor

@buchuitoudegou buchuitoudegou commented Apr 15, 2022

What problem does this PR solve?

Issue Number: close #5005

What is changed and how it works?

  • add new checker to check the number of connections

Check List

Tests

  • Unit test
  • Integration test

Code changes

Side effects

Related changes

Release note

`None`

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Apr 15, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lance6716
  • lichunzhu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. release-note-none Denotes a PR that doesn't merit a release note. needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-2.0.5 needs-cherry-pick-release-2.0.6 needs-cherry-pick-release-2.0.7 needs-cherry-pick-release-6.0 Should cherry pick this PR to release-6.0 branch. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 15, 2022
@buchuitoudegou buchuitoudegou added the area/dm Issues or PRs related to DM. label Apr 15, 2022
@buchuitoudegou
Copy link
Contributor Author

Will add ut later

@codecov-commenter
Copy link

codecov-commenter commented Apr 15, 2022

Codecov Report

Merging #5185 (c48b5cb) into master (6290709) will increase coverage by 0.0419%.
The diff coverage is 57.7092%.

Flag Coverage Δ
cdc 66.0872% <65.6862%> (-0.0258%) ⬇️
dm 52.1090% <51.2000%> (+0.0727%) ⬆️
engine 63.8494% <ø> (+0.1077%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #5185        +/-   ##
================================================
+ Coverage   59.4214%   59.4633%   +0.0419%     
================================================
  Files           763        767         +4     
  Lines         87418      87760       +342     
================================================
+ Hits          51945      52185       +240     
- Misses        30894      30969        +75     
- Partials       4579       4606        +27     

Copy link
Contributor

@Ehco1996 Ehco1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restLGTM

if len(c.stCfgs) > 0 && len(c.instances) > 0 {
switch c.stCfgs[0].Mode {
case config.ModeAll:
c.checkList = append(c.checkList, checker.NewLoaderConnAmountCheker(c.instances[0].targetDB, c.stCfgs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this check is somthing about connection count, do we need to put this item in the head of the check queue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checker list is parallel. So it doesn't matter.

for i, checker := range checkers {
wg.Add(1)
go func(i int, checker RealChecker) {
defer wg.Done()
result := checker.Check(ctx)
result.ID = uint64(i)
resultCh <- result
}(i, checker)
}

if len(c.stCfgs) > 0 && len(c.instances) > 0 {
switch c.stCfgs[0].Mode {
case config.ModeAll:
c.checkList = append(c.checkList, checker.NewLoaderConnAmountCheker(c.instances[0].targetDB, c.stCfgs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checker list is parallel. So it doesn't matter.

for i, checker := range checkers {
wg.Add(1)
go func(i int, checker RealChecker) {
defer wg.Done()
result := checker.Check(ctx)
result.ID = uint64(i)
resultCh <- result
}(i, checker)
}

return result
}
var rows *sql.Rows
rows, err = baseConn.QuerySQL(tcontext.NewContext(ctx, log.L()), "SHOW VARIABLES LIKE '%max_connections%'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reuse func GetMaxConnections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But GetMaxConnections has no retry strategy. Users have to retry by themselves if connection errors occur (#5015).

Actually, there are a bunch of different pkgs used for querying SQL, varying from implementation and retry strategy. BaseConn is good to have decent retry strategy as well as encapsulation. Should we refactor and integrate them as a whole package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, @lance6716 is planning to refactor the syncer. Idk if this could be part of the plan cuz it seems like a big one to do.

@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 18, 2022
Copy link
Contributor

@Ehco1996 Ehco1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 18, 2022
Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a checker for dumpling?

func NewLoaderConnAmountCheker(targetDB *conn.BaseDB, stCfgs []*config.SubTaskConfig) RealChecker {
return &LoaderConnAmountChecker{
connAmountChecker: newConnAmountChecker(targetDB, stCfgs, func(stCfgs []*config.SubTaskConfig) int {
loaderConn := 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loader's checkpoint db will use some connections. Does this part need to be calculated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a look at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every unit maintains 1 db connection for checkpoint. I added it and left a comment. PTAL again😀

dm/pkg/checker/conn_checker.go Outdated Show resolved Hide resolved
@buchuitoudegou
Copy link
Contributor Author

/run-verify

@buchuitoudegou
Copy link
Contributor Author

/run-dm-integration-test

1 similar comment
@buchuitoudegou
Copy link
Contributor Author

/run-dm-integration-test

dm/dm/config/checking_item.go Outdated Show resolved Hide resolved
syncerConn := 0
for _, stCfg := range stCfgs {
// syncer's worker and checkpoint (always keeps one db connection)
syncerConn += stCfg.SyncerConfig.WorkerCount + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe syncer has a DDL connection?

@@ -142,6 +145,39 @@ function run() {
fi
}

function test_full_mode_conn() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check the actual connection is same as you expected by SHOW PROCESSLIST

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean I should start-task and see if they are as expected?

Copy link
Contributor

@lance6716 lance6716 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. if you do this check, you can find the forgotten checkpoint / DDL / dump control connections on your own.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 19, 2022
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong action

@lance6716
Copy link
Contributor

https://ci2.pingcap.net/blue/organizations/jenkins/dm_ghpr_integration_test/detail/dm_ghpr_integration_test/8315/pipeline

sequence_sharding_optimistic failed

/run-dm-integration-test

please open an issue for them. I'm afraid we will forget them.

@buchuitoudegou
Copy link
Contributor Author

https://ci2.pingcap.net/blue/organizations/jenkins/dm_ghpr_integration_test/detail/dm_ghpr_integration_test/8315/pipeline
sequence_sharding_optimistic failed
/run-dm-integration-test

please open an issue for them. I'm afraid we will forget them.

#6515

@buchuitoudegou
Copy link
Contributor Author

@buchuitoudegou
Copy link
Contributor Author

@buchuitoudegou
Copy link
Contributor Author

/run-dm-integration-test

@buchuitoudegou
Copy link
Contributor Author

@buchuitoudegou
Copy link
Contributor Author

@buchuitoudegou
Copy link
Contributor Author

/run-dm-integration-test

@lance6716
Copy link
Contributor

/run-leak-test

@buchuitoudegou
Copy link
Contributor Author

@buchuitoudegou
Copy link
Contributor Author

checkpoint_transaction

[2022-07-29T07:07:41.645Z] [2022/07/29 15:04:00.319 +08:00] [ERROR] [baseconn.go:184] ["execute statement failed"] [task=sequence_sharding_removemeta] [unit=load] [query="CREATE TABLE `t_target` (`id` bigint NOT NULL AUTO_INCREMENT,`uid` int DEFAULT NULL,`name` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,`c` int DEFAULT NULL,`d` int DEFAULT NULL,`e` int DEFAULT NULL,PRIMARY KEY (`id`),UNIQUE KEY `uid` (`uid`),KEY `c` (`c`),KEY `d` (`d`),KEY `e` (`e`)) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;"] [argument="[]"] [error="Error 1050: Table 't_target' already exists"]

@lance6716
Copy link
Contributor

checkpoint_transaction

[2022-07-29T07:07:41.645Z] [2022/07/29 15:04:00.319 +08:00] [ERROR] [baseconn.go:184] ["execute statement failed"] [task=sequence_sharding_removemeta] [unit=load] [query="CREATE TABLE `t_target` (`id` bigint NOT NULL AUTO_INCREMENT,`uid` int DEFAULT NULL,`name` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,`c` int DEFAULT NULL,`d` int DEFAULT NULL,`e` int DEFAULT NULL,PRIMARY KEY (`id`),UNIQUE KEY `uid` (`uid`),KEY `c` (`c`),KEY `d` (`d`),KEY `e` (`e`)) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;"] [argument="[]"] [error="Error 1050: Table 't_target' already exists"]

this is an acceptable error, not the root cause. please open an issue

@ti-chi-bot
Copy link
Member

@buchuitoudegou: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@buchuitoudegou
Copy link
Contributor Author

/run-verify

1 similar comment
@buchuitoudegou
Copy link
Contributor Author

/run-verify

@ti-chi-bot ti-chi-bot merged commit 566357e into pingcap:master Jul 29, 2022
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: cannot checkout release-2.0.6: error checking out release-2.0.6: exit status 1. output: error: pathspec 'release-2.0.6' did not match any file(s) known to git

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: cannot checkout release-2.0.7: error checking out release-2.0.7: exit status 1. output: error: pathspec 'release-2.0.7' did not match any file(s) known to git

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jul 29, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #6535.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jul 29, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #6536.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #6537.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jul 29, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: cannot checkout release-2.0.5: error checking out release-2.0.5: exit status 1. output: error: pathspec 'release-2.0.5' did not match any file(s) known to git

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dm Issues or PRs related to DM. needs-cherry-pick-release-2.0.5 needs-cherry-pick-release-2.0.6 needs-cherry-pick-release-2.0.7 needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.0 Should cherry pick this PR to release-6.0 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DM report confusing error message after "too many connection" error
8 participants